Python: Add self-validating CFG tests by tausbn · Pull Request #21724 · github/codeql

tausbn · 2026-04-16T16:15:39Z

This PR implements an idea I've had for a while: we produce a comprehensively annotated suite of Python CFG tests that are self-validating (executing the code proves the annotations are correct), and then use these to validate/guide the control-flow implementation in QL.

For the avoidance of doubt, Copilot (with guidance) produced all of the test code in this PR. In particular, the self-validating aspect is very useful in this respect, as it provides a convenient feedback loop for the agent.

While the present set of tests takes full advantage of Python's operator overloading (and the fact that the "matrix multiplication" operator `@` has no inherent semantics), this approach could be easily adapted to many other languages.

These tests consist of various Python constructions (hopefully a somewhat comprehensive set) with specific timestamp annotations scattered throughout. When the tests are run using the Python 3 interpreter, these annotations are checked and compared to the "current timestamp" to see that they are in agreement. This is what makes the tests "self-validating". There are a few different kinds of annotations: the basic `t[4]` style (meaning this is executed at timestamp 4), the `t[dead(4)]` variant (meaning this _would_ happen at timestamp 4, but it is in a dead branch), and `t[never]` (meaning this is never executed at all). In addition to this, there is a query, MissingAnnotations, which checks whether we have applied these annotations maximally. Many expression nodes are not actually annotatable, so there is a sizeable list of excluded nodes for that query.
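To make the mechanism concrete, here is a minimal sketch of how a `t[n]` annotation attached with `@` could check itself at run time. This is not the `timer.py` from this PR: the `Timer` and `_Check` classes, the `errors` list, and the choice to start counting at 0 are all assumptions made for illustration, and the `dead(...)` and `never` variants are left out.

```python
class Timer:
    """Hypothetical minimal timer that counts annotated evaluations."""

    def __init__(self):
        self.now = 0      # the "current timestamp" (assumed to start at 0)
        self.errors = []  # (expected, actual) pairs for every mismatch

    def __getitem__(self, expected):
        # `t[4]` builds a check object that remembers the expected timestamp.
        return _Check(self, expected)


class _Check:
    def __init__(self, timer, expected):
        self.timer = timer
        self.expected = expected

    def __rmatmul__(self, value):
        # Invoked for `value @ t[n]` when `value` does not implement `@`,
        # which is the case for ints, strings, lists, and most other objects.
        timer = self.timer
        if timer.now != self.expected:
            timer.errors.append((self.expected, timer.now))
        timer.now += 1
        return value  # the annotation is transparent to the surrounding code


t = Timer()
# Left operand first, then right operand, then the sum itself.
result = ((1 @ t[0]) + (2 @ t[1])) @ t[2]

t2 = Timer()
wrong = 1 @ t2[1]  # claims timestamp 1, but actually runs at timestamp 0
```

Because the left operand of `@` is evaluated before `__rmatmul__` runs, the check fires exactly when the annotated expression has produced its value, so a wrong annotation shows up as a recorded mismatch rather than passing silently.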

These queries use the annotated, self-verifying test files to check various consistency requirements. Some of them may be expressing the same thing in different ways, but it's fairly cheap to keep them around, so I have not attempted to produce a minimal set of queries for this.

This one demonstrates a bug in the current CFG. In a dictionary comprehension `{k: v for k, v in d.items()}`, we evaluate the value before the key, which is incorrect. (A fix for this bug has been implemented in a separate PR.)
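The runtime order that the CFG should match can be observed directly. The sketch below uses a hypothetical `record` helper (not part of this PR) to log when each part of a dictionary comprehension is evaluated. As far as I know, the language reference guarantees key-before-value only from Python 3.8 onward, so the order check is guarded accordingly.

```python
import sys

order = []


def record(label, value):
    """Log that `label` was evaluated, then pass the value through."""
    order.append(label)
    return value


d = {"a": 1}
result = {record("key", k): record("value", v) for k, v in d.items()}

if sys.version_info >= (3, 8):
    # The key expression runs before the value expression.
    assert order == ["key", "value"]
```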

This looks for nodes annotated with `t[never]` in the test that are reachable in the CFG. This should not happen (it messes with various queries, e.g. the "mixed returns" query), but the test shows that in a few particular cases (involving the `match` statement where all cases contain `return`s), we _do_ have reachable nodes that shouldn't be.

This one is potentially a bit iffy -- it checks for a very powerful property (that implies many of the other queries), but as the test results show, it can produce false positives when there is in fact no problem. We may want to get rid of it entirely, if it becomes too noisy.

Copilot

Pull request overview

This PR adds a new, self-validating Python evaluation-order test suite for control-flow (CFG) work: Python source files embed timestamp annotations that can be validated by executing the files, and a set of QL queries then uses those annotations to validate/guide the CFG implementation.

Changes:

Introduces a timer.py harness (@test, dead(...), never, and @-based annotations) to self-validate evaluation order by running the Python test files.
Adds a broad set of Python evaluation-order test files covering core expressions, branching, loops, exceptions, async/await, generators/yield, match/case, comprehensions, etc.
Adds QL/QLL utilities and queries to locate annotations in the AST and check CFG properties against those annotations (with expected outputs where appropriate).


File	Description
python/ql/test/library-tests/ControlFlow/evaluation-order/TimerUtils.qll	QLL utility for identifying timer annotations and exposing CFG-oriented predicates.
python/ql/test/library-tests/ControlFlow/evaluation-order/timer.py	Python runtime harness for self-validating timestamp annotations.
python/ql/test/library-tests/ControlFlow/evaluation-order/test_basic.py	Self-validating tests for basic expression evaluation order and statement sequencing.
python/ql/test/library-tests/ControlFlow/evaluation-order/test_boolean.py	Self-validating tests for short-circuit boolean operator evaluation and control flow.
python/ql/test/library-tests/ControlFlow/evaluation-order/test_if.py	Self-validating tests for if/elif/else control-flow evaluation order.
python/ql/test/library-tests/ControlFlow/evaluation-order/test_conditional.py	Self-validating tests for ternary conditional expression evaluation order.
python/ql/test/library-tests/ControlFlow/evaluation-order/test_loops.py	Self-validating tests for while/for loops, break/continue, and loop else-clauses.
python/ql/test/library-tests/ControlFlow/evaluation-order/test_try.py	Self-validating tests for try/except/else/finally evaluation order.
python/ql/test/library-tests/ControlFlow/evaluation-order/test_assert_raise.py	Self-validating tests for assert/raise (including raise-from and reraises).
python/ql/test/library-tests/ControlFlow/evaluation-order/test_with.py	Self-validating tests for with-statement evaluation order.
python/ql/test/library-tests/ControlFlow/evaluation-order/test_unpacking.py	Self-validating tests for unpacking assignment and star-unpacking evaluation order.
python/ql/test/library-tests/ControlFlow/evaluation-order/test_augassign.py	Self-validating tests for augmented assignment evaluation order.
python/ql/test/library-tests/ControlFlow/evaluation-order/test_functions.py	Self-validating tests for function calls, defaults, decorators, and argument evaluation.
python/ql/test/library-tests/ControlFlow/evaluation-order/test_lambda.py	Self-validating tests for lambda creation/call evaluation order and closures.
python/ql/test/library-tests/ControlFlow/evaluation-order/test_classes.py	Self-validating tests for class definitions, decorators, instantiation, and method calls.
python/ql/test/library-tests/ControlFlow/evaluation-order/test_comprehensions.py	Self-validating tests for list/set/dict comprehensions and generator expressions.
python/ql/test/library-tests/ControlFlow/evaluation-order/test_yield.py	Self-validating tests for generators, yield/yield-from, send(), and generator expressions.
python/ql/test/library-tests/ControlFlow/evaluation-order/test_async.py	Self-validating tests for async/await, async-for, async-with, and asyncio constructs.
python/ql/test/library-tests/ControlFlow/evaluation-order/test_fstring.py	Self-validating tests for f-string evaluation order.
python/ql/test/library-tests/ControlFlow/evaluation-order/test_match.py	Self-validating tests for match/case evaluation order (with version guard).
python/ql/test/library-tests/ControlFlow/evaluation-order/OldCfgImpl.qll	Adapts the existing Python CFG to the signature used by the evaluation-order utilities.
python/ql/test/library-tests/ControlFlow/evaluation-order/MissingAnnotations.ql	Query to detect expressions in test functions that are missing annotations (excluding allowed cases).
python/ql/test/library-tests/ControlFlow/evaluation-order/MissingAnnotations.expected	Expected output for MissingAnnotations (empty).
python/ql/test/library-tests/ControlFlow/evaluation-order/ContiguousTimestamps.ql	Query to check timestamps are contiguous within each test function.
python/ql/test/library-tests/ControlFlow/evaluation-order/ContiguousTimestamps.expected	Expected output for ContiguousTimestamps (empty).
python/ql/test/library-tests/ControlFlow/evaluation-order/ConsecutiveTimestamps.ql	Query to check consecutive CFG annotations have consecutive timestamps.
python/ql/test/library-tests/ControlFlow/evaluation-order/ConsecutiveTimestamps.expected	Expected output for ConsecutiveTimestamps.
python/ql/test/library-tests/ControlFlow/evaluation-order/AllLiveReachable.ql	Query to check live annotations are reachable from scope entry in the CFG.
python/ql/test/library-tests/ControlFlow/evaluation-order/AllLiveReachable.expected	Expected output for AllLiveReachable (empty).
python/ql/test/library-tests/ControlFlow/evaluation-order/AnnotationHasCfgNode.ql	Query to check each non-dead annotation has a corresponding CFG node.
python/ql/test/library-tests/ControlFlow/evaluation-order/AnnotationHasCfgNode.expected	Expected output for AnnotationHasCfgNode (empty).
python/ql/test/library-tests/ControlFlow/evaluation-order/NoBasicBlock.ql	Query to check CFG nodes belong to basic blocks.
python/ql/test/library-tests/ControlFlow/evaluation-order/NoBasicBlock.expected	Expected output for NoBasicBlock (empty).
python/ql/test/library-tests/ControlFlow/evaluation-order/BasicBlockAnnotationGap.ql	Query to detect unannotated gaps within a basic block after an annotated node.
python/ql/test/library-tests/ControlFlow/evaluation-order/BasicBlockAnnotationGap.expected	Expected output for BasicBlockAnnotationGap (empty).
python/ql/test/library-tests/ControlFlow/evaluation-order/BasicBlockOrdering.ql	Query to check basic-block-local ordering constraints between annotations.
python/ql/test/library-tests/ControlFlow/evaluation-order/BasicBlockOrdering.expected	Expected output for BasicBlockOrdering.
python/ql/test/library-tests/ControlFlow/evaluation-order/NoBackwardFlow.ql	Query enforcing “time doesn’t flow backward” constraints between consecutive annotated CFG nodes.
python/ql/test/library-tests/ControlFlow/evaluation-order/NoBackwardFlow.expected	Expected output for NoBackwardFlow.
python/ql/test/library-tests/ControlFlow/evaluation-order/StrictForward.ql	Stricter forward-edge constraint for single-timestamp non-loop consecutive annotations.
python/ql/test/library-tests/ControlFlow/evaluation-order/StrictForward.expected	Expected output for StrictForward.
python/ql/test/library-tests/ControlFlow/evaluation-order/NeverReachable.ql	Query to ensure `never`-annotated nodes are not reachable from scope entry.
python/ql/test/library-tests/ControlFlow/evaluation-order/NeverReachable.expected	Expected output for NeverReachable.
python/ql/test/library-tests/ControlFlow/evaluation-order/NoSharedReachable.ql	Query to ensure shared timestamps only occur on mutually-exclusive paths.
python/ql/test/library-tests/ControlFlow/evaluation-order/NoSharedReachable.expected	Expected output for NoSharedReachable (empty).
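As an illustration of the kind of property these queries check, the contiguity requirement described for ContiguousTimestamps can be sketched in plain Python. The actual check is a QL query over the AST; the `timestamp_gaps` function below is a hypothetical stand-in, and it assumes that "contiguous" means no integer is skipped between the smallest and largest timestamp in a test function.

```python
def timestamp_gaps(timestamps):
    """Return the timestamps missing between the smallest and largest seen."""
    seen = set(timestamps)
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


contiguous = timestamp_gaps([0, 1, 2, 3])  # no gaps
gappy = timestamp_gaps([0, 1, 3, 5])       # 2 and 4 were never annotated
```

A gap usually means either an expression that was left unannotated or an annotation with the wrong number, both of which the runtime harness or MissingAnnotations would be expected to flag as well.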

Copilot's findings

Files reviewed: 40/45 changed files
Comments generated: 4

+ * Utility library for identifying timer annotations in evaluation-order tests.
+ *
+ * Identifies `expr @ t[n]` (matmul), `t(expr, n)` (call), and
+ * `expr @ t.dead[n]` (dead-code) patterns, extracts timestamp values,


+operands of binary operators, elements of collection literals, etc.)
+
+Every evaluated expression has a timestamp annotation, except the
+timer mechanism itself (t[n], t.dead[n]).


+def _report():
+    """Print summary at interpreter exit."""
+    if not _results:
+        return
+    total = len(_results)
+    passed = sum(1 for _, errors in _results if not errors)
+    print("---")
+    print(f"{passed}/{total} tests passed")
+    if passed < total:
+        sys.exit(1)


+    def __init__(self, timer, elements):
+        self._timer = timer
+        self._live = set()
+        self._dead = set()
+        self._never = False
+        for e in elements:
+            if isinstance(e, int):
+                self._live.add(e)
+            elif isinstance(e, _DeadMarker):
+                self._dead.add(e.timestamp)
+            elif isinstance(e, _NeverSentinel):
+                self._never = True
+


tausbn added the no-change-note-required label ("This PR does not need a change note") Apr 16, 2026

github-actions Bot added the Python label Apr 16, 2026

tausbn force-pushed the tausbn/python-add-self-validating-cfg-tests branch from a8f9f10 to 9f1c5e5 Compare April 16, 2026 20:51

tausbn force-pushed the tausbn/python-add-self-validating-cfg-tests branch from 9f1c5e5 to fa72fc0 Compare April 28, 2026 15:16

github-advanced-security AI found potential problems Apr 28, 2026


tausbn force-pushed the tausbn/python-add-self-validating-cfg-tests branch from fa72fc0 to c53bbdd Compare April 28, 2026 15:40

tausbn force-pushed the tausbn/python-add-self-validating-cfg-tests branch from c53bbdd to 70af53b Compare May 12, 2026 12:43

tausbn added 4 commits May 12, 2026 12:54

Python: Add BasicBlockOrdering test

fc2bc26

This one demonstrates a bug in the current CFG. In a dictionary comprehension `{k: v for k, v in d.items()}`, we evaluate the value before the key, which is incorrect. (A fix for this bug has been implemented in a separate PR.)

tausbn force-pushed the tausbn/python-add-self-validating-cfg-tests branch from 70af53b to f5c3b63 Compare May 12, 2026 12:55

tausbn marked this pull request as ready for review May 12, 2026 14:30

tausbn requested a review from a team as a code owner May 12, 2026 14:30

Copilot AI review requested due to automatic review settings May 12, 2026 14:30

Copilot started reviewing on behalf of tausbn May 12, 2026 14:31

Copilot AI reviewed May 12, 2026


Python: Address Copilot's comments

1ef557c

tausbn wants to merge 6 commits into main from tausbn/python-add-self-validating-cfg-tests
